A bug


It appears that both versions of the tool have a bug in the default conversions defined in NativeMappings.hs

They got introduced in a major restructuring that took place before 4.8 and slipped through my regression tests.

I’ll upload a fixed version as soon as possible.

Sorry for any inconvenience

Update: Version 0.5.1 fixes the bug mentioned.

Some more details: originally all my convertion functions were pure, the problem is, some activities like creating a pointer required IO. This is why currently everything is done in IO. When I converted the library I missed some calls, and those calls were giving an error now saying that the convertion function was undefined.

Tracing Lexer memory leaks


One of the problems I’ve been struggling with for a while now is the presence of pesky memory leaks in Visual Haskell. Hs2lib has one convention, It doesn’t free any memory, and so you’re responsible for freeing all memory.

As far as I knew, I was freeing any and all pointers that I had. It should not be leaking, but yet it was. So I decided to get to the root of this problem. I wrote a simple application that uses the Lexer classes of Visual Haskell and would emulate a user scrolling by feeding it lines of a Haskell file one at a time.

Using the Debug Diagnostics Tools I was able to track the application and make a full memory dump every few seconds in order to track the progression of the leak. The results were rather surprising:

WARNING – DebugDiag was not able to locate debug symbols for HsLexer.dll, so the reported function name(s) may not be accurate.

HsLexer.dll is responsible for 11.95 MBytes worth of outstanding allocations. The following are the top 2 memory consuming functions:

HsLexer!HsEnd+1343ac4: 8.20 MBytes worth of outstanding allocations.

HsLexer!HsEnd+150ae89: 2.00 MBytes worth of outstanding allocations.

This was detected in LexerLeakTest.exe__PID__4660__Date__07_04_2011__Time_03_20_30PM__18__Leak Dump – Private Bytes.dmp

So according to this tool, my little program was leaking quite extensively and not surprisingly, it was all coming from my inside my Haskell program. Unfortunately, GHC/GCC can not produce the proper symbols (.pdb files) for any of the Microsoft debugging tools to understand. So while we know conclusively that the program is leaking, we don’t know where.

Hs2lib-Debug

This is where the new version of Hs2lib comes into play. The idea is to track any and all allocations and de-allocations made during the run of the program, in essence a simple profiler.

For this reason we now have Debug versions of the of modules. Through the magic of CPPHS, Cabal and custom preprocessors we get the usual “release” modules and “debug” modules which write the allocations to a file.

I’ll skip over the implementation of that, but the idea is to override all allocation functions with a custom one. The structure which is being written out to disk (the current version uses the rather slow show/read instances generated by GHC. I will be replacing these in the future) is:

<br />
data MemAlloc = MemAlloc { memFun   :: Caller<br />
                         , memStack :: Stack<br />
                         , memStart :: Address<br />
                         , memStop  :: Maybe Address<br />
                         , memSize  :: Maybe MemSize<br />
                         , memTime  :: String<br />
                         }<br />
              | MemFree { memStack :: Stack<br />
                        , memStart :: Address<br />
                        , memSize  :: Maybe MemSize<br />
                        , memTime  :: String<br />
                        }<br />
              deriving (Show, Read)<br />

The allocations get written out to a file called “Memory.dump”.

A piece of one such file is:

<br />
MemAlloc { memFun = Record<br />
         , memStack = Then "WinDll\\Lib\\/NativeMapping_Base.cpphs:242(peekCWString)"<br />
                     (Then "HsLexer.hs:85(fromNative)"<br />
                     (Then "HsLexer.hs:83(lexSourceStringWithExtA)"<br />
                      Empty))<br />
         , memStart = 43309024<br />
         , memStop = Just 43309026<br />
         , memSize = Just 2<br />
         , memTime = "1312779866"<br />
         }<br />
MemFree { memStack = Then "WinDll\\Lib\\/NativeMapping_Base.cpphs:246(freeCWString)"<br />
                    (Then "HsLexer.hs:85(fromNative)"<br />
                    (Then "HsLexer.hs:83(lexSourceStringWithExtA)"<br />
                     Empty))<br />
        , memStart = 43309024<br />
        , memSize = Just 2<br />
        , memTime = "1312779866"<br />
        }<br />

From this you can see that not only does it record allocation, but I’ve also implemented a simple artificial stack, that shows us where the allocation is/originated. The current implementation is rather simplistic. I will look into expanding this later, but for now it suits my needs.

For those wondering, this tracker is not enabled by default. To enable it just pass the “- -debug” flag to the call to Hs2lib. Compiling with this flag instructs the library to use the Debug version of the libraries, and changes the code generators so that they also add extra information to the generated code.

Since de-allocations also have to be tracked, using this flag also exposes a free function which should be used to free pointers. If the CallingConvention  used is stdcall then a function “freeS” is exported, if ccall then “freeC” is exported. The reason for this is because these functions are statically defined. They aren’t being generated by the tool but are instead part of the library part of the tool.

Performing analysis

Once we have a .dump file, the next step is to analyze this information. This is where the new tool Hs2lib-debug comes in. This tool replays the allocations in a pseudo heap. If all goes well, at the end the heap should be empty. If it has any entries it means we’ have a leak.

Invoking it is quite easy, just pass it as an argument the folder which contains the dump file:

Hs2lib-debug.exe -v .\MemDumps

and that’s all.

Running this on the dump file created from the lexer program returned:

*** Program starting up…

*** Reading file ‘.\MemDumps\Memory.dump’…

*** Found 31890 record(s).

*** Analyzing [********************] 100.00%

*** Found 1135 outstanding allocation(s).

1135 unfreed references found originating from HsLexer.hs:85(fromNative)\;\;\WinDll\Lib\/NativeMapping_Base.cpphs:242(peekCWString)\;

*** Cleaning up….

Done.

The messy output for the stack is a as follows: Entries in the path are separated by a ;, instead of the usual \ character.

The function the profiler pointed out is:

<br />
lexSourceStringWithExtA :: StablePtr (IORef [ExtensionFlag]) -&gt; CWString -&gt; IO ((StatelessParseResultPtr (Located Token)))<br />
lexSourceStringWithExtA a1 a2 =<br />
  do let st = newStack (__FILE__ ++ ":" ++ (show __LINE__) ++ "(lexSourceStringWithExtA)")<br />
     a1p &lt;- fromNative (pushStack st (__FILE__ ++ ":" ++ (show __LINE__) ++ "(fromNative)")) a1<br />
     a2p &lt;- fromNative (pushStack st (__FILE__ ++ ":" ++ (show __LINE__) ++ "(fromNative)")) a2<br />
     res &lt;- lexSourceStringWithExt a1p a2p<br />
     toNative st res<br />

And the exact line is

<br />
a2p &lt;- fromNative (pushStack st (__FILE__ ++ ":" ++ (show __LINE__) ++ "(fromNative)")) a2<br />

This is interesting for a couple of reasons., The profiler is saying that the pointer associated with the CWString (which is defined as a Ptr CWchar) is never freed. But why not.

The answer lies in the C# Marshaller and types being used. We are currently Marshalling C# strings using the String datatype. Strings in C# are immutable, so once the marshaller creates a wchar_t* from the string, it never worries about it again. They are strictly an in parameter.

There are two ways to solve this:

  • C# does have mutable strings, using the StringBuffer type. Using StringBuffer has the benefit that it already is implemented as a char pointer. The marshaller simply passes the pointer to the Haskell function and upon. After the function returns and the GC determines that the StringBuffer is no longer in use, it should free the memory (at least in theory).
  • Make the library just free any CWString it dereferences.

For now I’ve chosen the second approach, for no other reason other than it requires the least amount of change in existing code. In the future I’ll adopt the approach that Hs2lib will free any arguments being passed to it. I don’t know if this is the convention usually used. If someone has something against this approach I would love to hear it.

Update : There’s somewhat of a big gotcha that I’ve recently discovered. We have to remember that the String type in .NET is immutable. So when the marshaller sends it to out Haskell code, the CWString we get there is a copy of the original. We have to free this. When GC is performed in C# it won’t affect the the CWString, which is a copy.

The problem however is that when we free it in the Haskell code we can’t use freeCWString. The pointer was not allocated with C (msvcrt.dll)’s alloc. There are three ways (that I know of) to solve this.

  • use char* in your C# code instead of String when calling a Haskell function. You then have the pointer to free when you call returns, or initialize the pointer using fixed.
  • import CoTaskMemFree in Haskell and free the pointer in Haskell
  • use StringBuilder instead of String. I’m not entirely sure about this one, but the idea is that since StringBuilder is implemented as a native pointer, the Marshaller just passes this pointer to your Haskell code (which can also update it btw). When GC is performed after the call returns, the StringBuilder should be freed.

I’ve once again updated the library to not free any pointers. This prevents a nasty heap corruption. To not get memory leaks in c# you are free to choose between solution 1 & 3 presented here.

Results

With this new code in place, I once again run the profiled lexer to generate a dump. This time when I analyze the result however I get

*** Program starting up…

*** Reading file ‘.\MemDumps\Memory.dump’…

*** Found 33027 record(s).

*** Analyzing [********************] 100.00%

*** Found 0 outstanding allocation(s).

Congratulations, No memory leak(s) detected.

*** Cleaning up….

And so the memory leaks are fixed. It’s worth noting that the analyzers can be quite slow. It uses a flat LinkedList like structure and lists to do the analysis. I will in future versions be replacing these with a Tree like structure and arrays respectively.

Hs2lib: Automating compilation of dynamic libs


As most of you know Visual Haskell makes use of Haskell programs compiled to a dynamic library (dll).  This task can be completely automated. This is what Hs2lib provides. Hs2lib (formerly WinDll) was made to facilitate making changes and updates to Visual Haskell’s support files. It was designed for purely that purpose and thus has some limitations and design decisions that reflect this. I’ve been working mostly on getting this to a stable state the last few weeks/months. The one used in the original Visual Haskell used a hack for some things like lists. This one has a stable interface.

get it from Hackage with

cabal install hs2lib

using it is quite simple, I’ll illustrate with an example:

module Arith where

-- @@ Export
summerize :: [Int] -> [Int] -> Int
summerize x y = sum (x ++ y)

-- @@ Export
single :: Int -> [Int]
single x = [1..x]

Hs2lib does not export all functions automatically. you have to mark them with a special annotation “--@@ Export [=optional_name]”

if no name is specified, the function name is used. As a note, only functions with an explicit type signature can be exported. This tool works by static analysis. To compile a simple invocation to hs2lib is sufficient

PS C:\Examples> hs2lib Arith
Linking main.exe …
Done.

After this there should be 2 header files and one dll file named Hs2lib.dll in the folder. This name can be changed with the –n flag. Your Haddock documentation is carried over along with the original function type as comments for the new generated functions in the headers. Marshaling data is automatically generated for you for any data structures found during the dependency analysis phase. This can only be done if the source for the data type is available.

An example of how to call this function via C is

#include <stdio.h>
#include <stdlib.h>
#include "Hs2lib_FFI.h"

int main(int argc, char** argv) {

    HsStart();

    int* foo = (int*)malloc(sizeof(int)*3);
    foo[0] = 2;
    foo[1] = 5;
    foo[2] = 10;

    printf("Sum result: %d\n", summerize(3, foo, 3, foo));

    free(foo);

    int count;
    int* bar = single(8, &count);

    printf("Single result: (%d)\n", count);

    int i;
    for(i = 0; i < count; i++)
        printf("\t%d\n", bar[i]);

    free(bar);

    HsEnd();

    return (EXIT_SUCCESS);
}

Since Visual Haskell is written in C# this tool can also output C# code, this is done by using the –c# flag. I’m working on a PDF that goes into much greater detail about this tool and it’s options, But I need to reword some parts of it as I’ve been told it’s too “handholdy” (simplistic) in some parts. So I’ll hold off on publishing that.

But here are the Highlight of this tool

Currently Supported:

  • Generates Marshalling information for abitrary datatypes
  • Types with kind other then * have to be fully applied. (e.g. Maybe String). A specialized Marshaling instance is then generated for each of these instances.
    • Generates FFI exports for any function you mark, with FFI compatible type
    • Properly supports marshaling of lists. [Int], where the list is an argument becomes CInt –> Ptr Int , getting an explicit size for the list, and where a return value becomes Ptr CInt –> Ptr Int, where it expects a pointer to which to write the size of the list. This introduces a semantic difference between [Char] and String. The former is treated as a list of characters with no explicit terminator, whereas the latter it treated as a Null terminated wide string.
    • Supports Ptr, FunPtr and StablePtr. Which introduces the possibility to have “caches”.
    • Supports Callbacks via Higherorder functions (everything is autogenerated from the function type)
    • Data types with multiple constructors become a union, where data types with a single constructor or a newtype are inlined and treated equally.
    • Re-exports existing exports found in source file
    • Honors existing Storable instances found for types it needs
    • Avoids unsafePerformIO as much as possible, except for in one specific instance.
    • Generates Initialization functions for you
    • Hides unnecessary exports using a DEF file
    • Allows you to override my default type conversions (e.g. String –> CWString)
    • Provides helper includes for C, C++ and C# (placed into wherever cabal places extra includes. %AppData%\cabal\Hs2lib-0.4.8 for windows)
    • And more

Limitations:

  • Does not support automatic expansion of lists in Applied types other then IO (e.g. Maybe [Int])
    • Does not support infix constructors (Arbitrary codegen limit, I’ve never needed it. Will add in future versions)
    • Code generator generates a bit too many parenthesis for the temporary Haskell file generated. Will fix in future versions.
    • Does not support polymorphic functions (this is a FFI limit, cannot know the size of a polymorphic value.)
    • Cannot export functions or types with the same name. The types are imported unqualified.

I will work  on the pdf documentation in the coming 2 weeks, That should be a complete overview on the abilities of this tool and why I made certain decisions.

To verify that the install (and the package is complete) , there are a set of tests included in the tar. just unpack it, and ghci to Tests.TestRunner and run the function “runLocalTests”. The cabal test interface I was using before got deprecated. I’ll have to look up how the new ones work, until then, sorry about that Smile

I will publish a more detailed tutorial later during the day.